Customizing Vector Instruction Set Architectures

Authors

  • Cătălin Bogdan CIOBANU
  • Bogdan CIOBANU
Abstract

Data-level parallelism (DLP) can be exploited to improve processor performance for certain workload types. Two main application fields rely on DLP: multimedia and scientific computing. Most existing multimedia vector extensions use sub-word parallelism and wide data paths to process independent, mainly integer, values in parallel. Classic vector supercomputers, on the other hand, rely on efficient processing of the large arrays of floating-point numbers typically found in scientific applications. In both cases, selecting an appropriate instruction set architecture (ISA) is crucial to exploiting the available DLP and achieving high performance. The main objective of this thesis is to develop a methodology for synthesizing customized vector ISAs for various application domains, targeting high-performance program execution. To accomplish this objective, a number of applications from the telecommunication and linear algebra domains have been studied, and custom vector instruction sets have been synthesized. Three algorithms that compute shortest paths in a directed graph (Dijkstra, Floyd and Bellman-Ford) have been analyzed, along with the widely used Linpack floating-point benchmark. The framework used to customize the ISAs consisted of the GNU C Compiler, versions 4.1.2 and 2.7.2.3, and the SimpleScalar-3.0d tool set, extended to simulate customized vector units. The modifications applied to the simulator include the addition of a vector register file, vector functional units and specific vector instructions. The main results of this thesis can be summarized as follows: overall application speedups of 24.88x for Dijkstra (after both code optimization and vectorization), 4.99x for Floyd, 9.27x for Bellman-Ford and 4.33x for the C version of Linpack. These results suggest a consistent improvement in execution times due to the customized vector instruction sets.
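To illustrate the kind of data-level parallelism the abstract refers to, the following is a minimal C sketch of Floyd's all-pairs shortest-path algorithm with its innermost loop strip-mined into fixed-length chunks. It is not the instruction set or code from the thesis: the names `VLEN`, `MAX_N`, `INF`, `vec_add_min` and `floyd_strip_mined` are hypothetical, the vector length of 8 is an arbitrary assumption, and edge weights are assumed to be non-negative. The helper `vec_add_min` stands in, in plain scalar C, for what a vector unit might perform as a scalar-vector add followed by an element-wise minimum.

```c
#include <assert.h>
#include <limits.h>

#define MAX_N 64            /* hypothetical upper bound on the number of nodes */
#define VLEN 8              /* hypothetical vector length (elements per vector op) */
#define INF (INT_MAX / 2)   /* "no path" marker; halved so INF + INF cannot overflow */

/* Scalar model of one vector strip:
   row_i[j] = min(row_i[j], d_ik + row_k[j]) for len consecutive elements.
   Every element is independent, which is what makes the loop vectorizable. */
static void vec_add_min(int *row_i, const int *row_k, int d_ik, int len)
{
    for (int j = 0; j < len; j++) {
        int via_k = d_ik + row_k[j];
        if (via_k < row_i[j])
            row_i[j] = via_k;
    }
}

/* Floyd's algorithm on an n x n distance matrix, updated in place.
   Assumes non-negative edge weights and dist[i][i] == 0. */
void floyd_strip_mined(int dist[MAX_N][MAX_N], int n)
{
    assert(n >= 0 && n <= MAX_N);
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            int d_ik = dist[i][k];
            if (d_ik >= INF)
                continue;               /* no path i -> k, nothing can improve */
            /* Strip-mine the j loop: full strips of VLEN, then a shorter tail. */
            for (int j = 0; j < n; j += VLEN) {
                int len = (n - j < VLEN) ? (n - j) : VLEN;
                vec_add_min(&dist[i][j], &dist[k][j], d_ik, len);
            }
        }
    }
}
```

The strip-mined form mirrors how a vector machine with a fixed maximum vector length would process a row that is longer than one vector register; on real hardware the tail strip would typically be handled by setting a vector-length register or a mask.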


Similar resources

Customization of an embedded RISC CPU with SIMD extensions for video encoding: A case study

This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both...


Customizing the Datapath and ISA of Soft VLIW Processors

In this paper, we examine the trade-offs in performance and area due to customizing the datapath and instruction set architecture of a soft VLIW processor implemented in a high-density FPGA. In addition to describing our processor, we describe a number of microarchitectural optimizations we used to reduce the area of the datapath. We also describe the tools we developed to customize, generate, ...


A Comparison Between Processor Architectures for Multimedia Applications

The efficient processing of MultiMedia Applications (MMAs) is currently one of the main bottlenecks in the media processing field. Many architectures have been proposed for processing MMAs such as VLIW, superscalar (general-purpose processor enhanced with a multimedia extension such as MMX), vector architectures, SIMD architectures, and reconfigurable computing devices. The question then arises...


Simple ASIC Complex ASIC RaPiD FPGA GARP DPGA SuperSpeculative RAW TRACE ( Multiscalar ) SMT VECTOR

Poor scalability of superscalar architectures with increasing instruction-level parallelism (ILP) has resulted in a trend towards statically scheduled horizontal architectures such as Very Long Instruction Word (VLIW) processors and their more sophisticated successors, called Explicitly Parallel Instruction Computing (EPIC) architectures. We extend the EPIC model with additional capabilities to...


For Embedded Applications with Data-Level Parallelism, a Vector Processor Offers High Performance at Low Power Consumption and Low Design Complexity. Unlike Superscalar and VLIW Designs, a Vector Processor Is Scalable and Can Optimally Match Specific

Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...



Publication date: 2007